The goal of this work is to study whether there is any relationship between the BCG vaccine (Bacillus Calmette-Guérin) against tuberculosis and COVID-19 mortality data in a number of countries. Some studies suggest that this vaccine strengthens the immune response of the population, which might be reflected in the lower number of COVID-19 deaths observed in certain countries. Using the BCG and COVID-19 mortality datasets provided by The BCG World Atlas and by the BCG - COVID-19 AI Challenge on Kaggle, we will try to uncover such relationships.
The files in question are in `csv` format, so they are easy to import into data frames in R:
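# Packages used throughout the analysis
library(tidyverse) # readr (read_csv), dplyr (left_join, select, %>%), ggplot2
library(reshape2) # melt()
library(plotly) # plot_ly()
library(ggrepel) # geom_label_repel()
# Load both datasets. TODO: add an explanation of what they contain.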
BCG_strain <-
read_csv("task_2-BCG_strain_per_country-1Nov2020.csv")
COVID_noformat <-
read_csv(
"task_2-COVID-19-death_cases_per_country_after_fifth_death-till_22_September_2020.csv"
)
# I tried to look inside the data frames, but the printed output is ugly, so I will write it out by hand
# str(COVID_noformat)
# str(BCG_strain)
The columns of the BCG_strain and COVID_noformat data frames are the following:
| BCG_strain | COVID_noformat |
|---|---|
| country_name | country_name |
| country_code | alpha_3_code |
| mandatory_bcg_strain_2015-2020 | date_first_death |
| mandatory_bcg_strain_2010-2015 | date_fifth_death |
| mandatory_bcg_strain_2005-2010 | deaths_per_million_10_days_after_fifth_death |
| mandatory_bcg_strain_2000-2005 | deaths_per_million_15_days_after_fifth_death |
| mandatory_bcg_strain_1990-2000 | deaths_per_million_20_days_after_fifth_death |
| mandatory_bcg_strain_1980-1990 | deaths_per_million_25_days_after_fifth_death |
| mandatory_bcg_strain_1970-1980 | deaths_per_million_30_days_after_fifth_death |
| mandatory_bcg_strain_1960-1970 | deaths_per_million_35_days_after_fifth_death |
| mandatory_bcg_strain_1950-1960 | deaths_per_million_40_days_after_fifth_death |
| vaccination_timing_unified | deaths_per_million_45_days_after_fifth_death |
| BCG Atlas: Which year was vaccination introduced? | deaths_per_million_50_days_after_fifth_death |
| Year of changes to BCG schedule | deaths_per_million_55_days_after_fifth_death |
| BCG Atlas: BCG Recommendation Type | deaths_per_million_60_days_after_fifth_death |
| BCG Atlas: Details of changes | deaths_per_million_65_days_after_fifth_death |
| BCG Atlas: Timing of 1st BCG? | deaths_per_million_70_days_after_fifth_death |
| BCG Atlas: BCG Strain | deaths_per_million_75_days_after_fifth_death |
| BCG Atlas: How long has this BCG vaccine strain been used? | deaths_per_million_80_days_after_fifth_death |
| | deaths_per_million_85_days_after_fifth_death |
| | deaths_per_million_90_days_after_fifth_death |
| | deaths_per_million_95_days_after_fifth_death |
| | deaths_per_million_100_days_after_fifth_death |
| | deaths_per_million_105_days_after_fifth_death |
| | deaths_per_million_110_days_after_fifth_death |
| | deaths_per_million_115_days_after_fifth_death |
| | deaths_per_million_120_days_after_fifth_death |
| | deaths_per_million_125_days_after_fifth_death |
| | deaths_per_million_130_days_after_fifth_death |
| | deaths_per_million_135_days_after_fifth_death |
| | deaths_per_million_140_days_after_fifth_death |
| | deaths_per_million_145_days_after_fifth_death |
| | deaths_per_million_150_days_after_fifth_death |
| | stringency_index_10_days_after_fifth_death |
| | stringency_index_15_days_after_fifth_death |
| | stringency_index_20_days_after_fifth_death |
| | stringency_index_25_days_after_fifth_death |
| | stringency_index_30_days_after_fifth_death |
| | stringency_index_35_days_after_fifth_death |
| | stringency_index_40_days_after_fifth_death |
| | stringency_index_45_days_after_fifth_death |
| | stringency_index_50_days_after_fifth_death |
| | stringency_index_55_days_after_fifth_death |
| | stringency_index_60_days_after_fifth_death |
| | stringency_index_65_days_after_fifth_death |
| | stringency_index_70_days_after_fifth_death |
| | stringency_index_75_days_after_fifth_death |
| | stringency_index_80_days_after_fifth_death |
| | stringency_index_85_days_after_fifth_death |
| | stringency_index_90_days_after_fifth_death |
| | stringency_index_95_days_after_fifth_death |
| | stringency_index_100_days_after_fifth_death |
| | stringency_index_105_days_after_fifth_death |
| | stringency_index_110_days_after_fifth_death |
| | stringency_index_115_days_after_fifth_death |
| | stringency_index_120_days_after_fifth_death |
| | stringency_index_125_days_after_fifth_death |
| | stringency_index_130_days_after_fifth_death |
| | stringency_index_135_days_after_fifth_death |
| | stringency_index_140_days_after_fifth_death |
| | stringency_index_145_days_after_fifth_death |
| | stringency_index_150_days_after_fifth_death |
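A listing like the one above does not have to be typed by hand. The following is a minimal sketch, assuming two small toy data frames (`bcg_toy` and `covid_toy`, hypothetical stand-ins for the real ones) and a hypothetical helper `pad_to()`; it places the column names of both data frames side by side using base R only.

```r
# Toy stand-ins for the two data frames (hypothetical columns, not the real data).
bcg_toy <- data.frame(country_name = "A", country_code = "AAA",
                      stringsAsFactors = FALSE)
covid_toy <- data.frame(country_name = "A", alpha_3_code = "AAA",
                        date_first_death = "01/03/20",
                        stringsAsFactors = FALSE)

# Pad the shorter vector of names with empty strings so both have equal length.
pad_to <- function(x, n) c(x, rep("", n - length(x)))

n_rows <- max(ncol(bcg_toy), ncol(covid_toy))
column_overview <- data.frame(
  BCG_strain = pad_to(names(bcg_toy), n_rows),
  COVID_noformat = pad_to(names(covid_toy), n_rows),
  stringsAsFactors = FALSE
)
print(column_overview)
```

With the real data, `bcg_toy` and `covid_toy` would simply be replaced by `BCG_strain` and `COVID_noformat`.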
A preliminary look at these data shows that every column is stored as a string and that many columns contain missing data (some hold nothing but NULL). We therefore clean the data and convert the variable types so that later manipulation is more convenient. The details are shown in the following code block:
# Clean the BCG data
# Keep only the columns with no NA at all (any column containing an NA is dropped)
BCG_strain <- BCG_strain[, apply(!is.na(BCG_strain), 2, all)]
# For now I do not care which strain was given in each period, only whether one was given.
# Recode the values of each period as
# 0 - no vaccine was given (so far "None")
# 1 - a vaccine was given
# NA - this value is unknown (so far "Unknown")
BCG_strain_no_strain <- BCG_strain
# Recode the values of every column except country_name
BCG_strain_no_strain[, -1] <-
  sapply(BCG_strain_no_strain[, -1], function(x) {
    a <-
      gsub("None", 0, x) %>% gsub("Unknown", NA, .) # Set the 0s and the NAs.
    for (i in seq_along(a)) {
      # Everything that is neither 0 nor NA becomes 1
      if (!is.na(a[i]) && a[i] != "0") {
        a[i] <- 1
      }
    }
    return(as.integer(a)) # Convert the columns to integer
  })
####################################################################################
# Clean the COVID data
# Keep only the columns with no NA at all (any column containing an NA is dropped)
COVID_noNA <- COVID_noformat[, apply(!is.na(COVID_noformat), 2, all)]
# In this dataset the empty values are written as the string "NULL";
# change them to NA
COVID_Na <- sapply(COVID_noNA, function(x)
  gsub("NULL", NA, x))
# The result of the previous call is a character matrix. Convert it to a data frame.
COVID_Na_df <- as.data.frame(COVID_Na)
# Convert the dates so that they are stored as Date
COVID_Na_df[, c("date_fifth_death")] <-
  as.Date(COVID_Na_df[, c("date_fifth_death")], "%d/%m/%y")
COVID_Na_df[, c("date_first_death")] <-
  as.Date(COVID_Na_df[, c("date_first_death")], "%d/%m/%y")
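# Convert the deaths and stringency indices so that they are stored as numeric.
# Go through as.character() first: if the columns are factors,
# as.numeric() alone returns the level codes instead of the values.
COVID_Na_df[, -c(1, 2, 3, 4)] <-
  sapply(COVID_Na_df[, -c(1, 2, 3, 4)], function(x)
    as.numeric(as.character(x)))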
################################################################################################
# Join both data frames into a single one.
COVID_BGC <-
left_join(BCG_strain_no_strain, COVID_Na_df, by = "country_name")
# Shorten the column names, they are very long
colnames(COVID_BGC) <-
gsub("mandatory_bcg_strain_", "strain", colnames(COVID_BGC)) %>%
gsub("deaths_per_million", "dpm", .) %>%
gsub("days_after_fifth_death", "d", .) %>%
gsub("stringency_index", "si", .)
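One detail of the numeric conversion in the block above is worth illustrating. In R versions before 4.0, `as.data.frame()` turns the columns of a character matrix into factors by default, and calling `as.numeric()` directly on a factor returns its integer level codes rather than the numbers it displays. The sketch below uses a small hypothetical vector `raw` (not taken from the dataset) to show the difference.

```r
# Hypothetical values stored as text, as they would be after replacing "NULL" with NA.
raw <- c("100", "20", "3", NA)

# Simulate what a factor column would look like.
as_factor <- factor(raw)

# as.numeric() on a factor returns the integer level codes
# (levels are sorted as text: "100" < "20" < "3") ...
codes <- as.numeric(as_factor)

# ... whereas going through character first recovers the original numbers.
values <- as.numeric(as.character(as_factor))

print(codes)
print(values)
```

A quick sanity check after any such conversion is to compare the range of the converted column with the range expected from the raw file.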
The first rows of the resulting table are shown below:
| country_name | strain2015-2020 | strain2010-2015 | strain2005-2010 | strain2000-2005 | strain1990-2000 | strain1980-1990 | strain1970-1980 | strain1960-1970 | strain1950-1960 | alpha_3_code |
|---|---|---|---|---|---|---|---|---|---|---|
| Afghanistan | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | AFG |
| Albania | | | | | | | | 0 | 0 | ALB |
| Algeria | 1 | | | | | | | 0 | 0 | DZA |
| Angola | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | AGO |
| Argentina | 0 | 0 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | ARG |
| Armenia | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | ARM |
| country_name | date_first_death | date_fifth_death | dpm_10_d | dpm_15_d | dpm_20_d | dpm_25_d | dpm_30_d | dpm_35_d | dpm_40_d | dpm_45_d |
|---|---|---|---|---|---|---|---|---|---|---|
| Afghanistan | 2020-03-24 | 2020-04-05 | 48 | 47 | 41 | 44 | 79 | 80 | 102 | 118 |
| Albania | 2020-03-12 | 2020-03-25 | 155 | 161 | 159 | 169 | 168 | 41 | 35 | 33 |
| Algeria | 2020-03-13 | 2020-03-18 | 49 | 66 | 125 | 153 | 161 | 167 | 167 | 32 |
| Angola | 2020-03-30 | 2020-06-12 | 21 | 17 | 20 | 20 | 21 | 19 | 15 | 19 |
| Argentina | 2020-03-08 | 2020-03-25 | 59 | 68 | 91 | 91 | 116 | 135 | 142 | 153 |
| Armenia | 2020-03-27 | 2020-04-03 | 144 | 155 | 162 | 56 | 49 | 55 | 53 | 71 |
| country_name | dpm_50_d | dpm_55_d | dpm_60_d | dpm_65_d | dpm_70_d | dpm_75_d | dpm_80_d | dpm_85_d | dpm_90_d | dpm_95_d |
|---|---|---|---|---|---|---|---|---|---|---|
| Afghanistan | 134 | 146 | 155 | 165 | 19 | 30 | 37 | 43 | 51 | 58 |
| Albania | 32 | 25 | 23 | 25 | 18 | 22 | 22 | 24 | 26 | 39 |
| Algeria | 33 | 30 | 27 | 31 | 31 | 32 | 36 | 39 | 36 | 41 |
| Angola | 22 | 22 | 56 | 59 | 55 | 78 | 77 | 81 | 79 | 98 |
| Argentina | 160 | 166 | 167 | 24 | 21 | 31 | 42 | 57 | 57 | 65 |
| Armenia | 75 | 109 | 132 | 146 | 161 | 19 | 25 | 28 | 29 | 33 |
| country_name | dpm_100_d | dpm_105_d | dpm_110_d | dpm_115_d | dpm_120_d | dpm_125_d | dpm_130_d | dpm_135_d | dpm_140_d | dpm_145_d |
|---|---|---|---|---|---|---|---|---|---|---|
| Afghanistan | 54 | 64 | 67 | 74 | 72 | 70 | 71 | 74 | 75 | 77 |
| Albania | 51 | 57 | 71 | 84 | 96 | 106 | 104 | 122 | 126 | 128 |
| Algeria | 44 | 45 | 46 | 53 | 50 | 54 | 55 | 61 | 63 | 69 |
| Angola | 99 | | | | | | | | | |
| Argentina | 79 | 83 | 97 | 106 | 111 | 123 | 129 | 141 | 10 | 25 |
| Armenia | 37 | 46 | 48 | 57 | 56 | 55 | 57 | 62 | 65 | 67 |
| country_name | dpm_150_d | si_10_d | si_15_d | si_20_d | si_25_d | si_30_d | si_35_d | si_40_d | si_45_d | si_50_d |
|---|---|---|---|---|---|---|---|---|---|---|
| Afghanistan | 80 | 52 | 47 | 42 | 45 | 45 | 47 | 48 | 56 | 56 |
| Albania | 132 | 52 | 47 | 42 | 54 | 53 | 53 | 55 | 62 | 62 |
| Algeria | 73 | 39 | 50 | 46 | 49 | 56 | 56 | 35 | 41 | 43 |
| Angola | | 40 | 34 | 32 | 35 | 35 | 35 | 34 | 40 | 47 |
| Argentina | 32 | 1 | 1 | 1 | 1 | 1 | 52 | 54 | 61 | 63 |
| Armenia | 70 | | | | | | | | | |
| country_name | si_55_d | si_60_d | si_65_d | si_70_d | si_75_d | si_80_d | si_85_d | si_90_d | si_95_d | si_100_d |
|---|---|---|---|---|---|---|---|---|---|---|
| Afghanistan | 62 | 67 | 60 | 62 | 61 | 61 | 64 | 69 | 64 | 68 |
| Albania | 60 | 66 | 71 | 41 | 41 | 41 | 44 | 45 | 42 | 46 |
| Algeria | 66 | 54 | 56 | 59 | 57 | 57 | 60 | 46 | 43 | 48 |
| Angola | 53 | 59 | 61 | 58 | 56 | 56 | 68 | 72 | 66 | |
| Argentina | 69 | 74 | 76 | 77 | 71 | 71 | 76 | 79 | 73 | 76 |
| Armenia | | | | | | | | | | |
| country_name | si_105_d | si_110_d | si_115_d | si_120_d | si_125_d | si_130_d | si_135_d | si_140_d | si_145_d | si_150_d |
|---|---|---|---|---|---|---|---|---|---|---|
| Afghanistan | 67 | 63 | 66 | 60 | 52 | 53 | 27 | 30 | 5 | 3 |
| Albania | 41 | 42 | 42 | 37 | 36 | 36 | 31 | 35 | 36 | 35 |
| Algeria | 49 | 48 | 60 | 56 | 57 | 56 | 66 | 68 | 73 | 53 |
| Angola | | | | | | | | | | |
| Argentina | 76 | 72 | 77 | 71 | 72 | 72 | 75 | 79 | 77 | 77 |
| Armenia | | | | | | | | | | |
We will carry out our analyses on this table. Its structure is shown below:
str(COVID_BGC)
## tibble [184 x 71] (S3: tbl_df/tbl/data.frame)
## $ country_name : chr [1:184] "Afghanistan" "Albania" "Algeria" "Angola" ...
## $ strain2015-2020 : int [1:184] 1 NA 1 1 0 1 1 0 NA 0 ...
## $ strain2010-2015 : int [1:184] 1 NA NA 1 0 1 1 0 NA 0 ...
## $ strain2005-2010 : int [1:184] 1 NA NA 1 1 1 1 0 NA NA ...
## $ strain2000-2005 : int [1:184] 1 NA NA 1 1 1 1 0 NA NA ...
## $ strain1990-2000 : int [1:184] 1 NA NA 1 1 0 1 0 NA NA ...
## $ strain1980-1990 : int [1:184] 1 NA NA 1 0 0 1 NA NA NA ...
## $ strain1970-1980 : int [1:184] 0 NA NA 0 0 0 1 NA NA NA ...
## $ strain1960-1970 : int [1:184] 0 0 0 0 0 0 1 NA NA NA ...
## $ strain1950-1960 : int [1:184] 0 0 0 0 0 0 1 NA NA NA ...
## $ alpha_3_code : Factor w/ 211 levels "ABW","AFG","AGO",..: 2 5 56 3 8 9 11 12 13 21 ...
## $ date_first_death: Date[1:184], format: "2020-03-24" "2020-03-12" ...
## $ date_fifth_death: Date[1:184], format: "2020-04-05" "2020-03-25" ...
## $ dpm_10_d : num [1:184] 48 155 49 21 59 144 37 173 68 141 ...
## $ dpm_15_d : num [1:184] 47 161 66 17 68 155 45 95 64 137 ...
## $ dpm_20_d : num [1:184] 41 159 125 20 91 162 47 127 59 138 ...
## $ dpm_25_d : num [1:184] 44 169 153 20 91 56 85 137 86 133 ...
## $ dpm_30_d : num [1:184] 79 168 161 21 116 49 82 144 81 140 ...
## $ dpm_35_d : num [1:184] 80 41 167 19 135 55 79 154 77 148 ...
## $ dpm_40_d : num [1:184] 102 35 167 15 142 53 94 157 95 159 ...
## $ dpm_45_d : num [1:184] 118 33 32 19 153 71 93 158 94 160 ...
## $ dpm_50_d : num [1:184] 134 32 33 22 160 75 92 156 111 164 ...
## $ dpm_55_d : num [1:184] 146 25 30 22 166 109 94 154 127 28 ...
## $ dpm_60_d : num [1:184] 155 23 27 56 167 132 95 150 140 35 ...
## $ dpm_65_d : num [1:184] 165 25 31 59 24 146 92 156 152 51 ...
## $ dpm_70_d : num [1:184] 19 18 31 55 21 161 96 152 163 69 ...
## $ dpm_75_d : num [1:184] 30 22 32 78 31 19 102 152 21 100 ...
## $ dpm_80_d : num [1:184] 37 22 36 77 42 25 99 155 31 114 ...
## $ dpm_85_d : num [1:184] 43 24 39 81 57 28 98 154 40 127 ...
## $ dpm_90_d : num [1:184] 51 26 36 79 57 29 95 150 49 127 ...
## $ dpm_95_d : num [1:184] 58 39 41 98 65 33 95 151 57 137 ...
## $ dpm_100_d : num [1:184] 54 51 44 99 79 37 94 149 60 145 ...
## $ dpm_105_d : num [1:184] 64 57 45 NA 83 46 92 145 75 143 ...
## $ dpm_110_d : num [1:184] 67 71 46 NA 97 48 86 142 76 145 ...
## $ dpm_115_d : num [1:184] 74 84 53 NA 106 57 89 147 88 153 ...
## $ dpm_120_d : num [1:184] 72 96 50 NA 111 56 86 143 98 149 ...
## $ dpm_125_d : num [1:184] 70 106 54 NA 123 55 89 137 97 147 ...
## $ dpm_130_d : num [1:184] 71 104 55 NA 129 57 98 130 97 146 ...
## $ dpm_135_d : num [1:184] 74 122 61 NA 141 62 115 133 102 11 ...
## $ dpm_140_d : num [1:184] 75 126 63 NA 10 65 132 130 103 16 ...
## $ dpm_145_d : num [1:184] 77 128 69 NA 25 67 7 132 106 18 ...
## $ dpm_150_d : num [1:184] 80 132 73 NA 32 70 25 129 107 15 ...
## $ si_10_d : num [1:184] 52 52 39 40 1 NA 28 53 57 39 ...
## $ si_15_d : num [1:184] 47 47 50 34 1 NA 27 48 57 33 ...
## $ si_20_d : num [1:184] 42 42 46 32 1 NA 27 43 54 30 ...
## $ si_25_d : num [1:184] 45 54 49 35 1 NA 30 42 56 33 ...
## $ si_30_d : num [1:184] 45 53 56 35 1 NA 29 42 50 32 ...
## $ si_35_d : num [1:184] 47 53 56 35 52 NA 24 38 51 33 ...
## $ si_40_d : num [1:184] 48 55 35 34 54 NA 25 38 50 32 ...
## $ si_45_d : num [1:184] 56 62 41 40 61 NA 31 26 57 38 ...
## $ si_50_d : num [1:184] 56 62 43 47 63 NA 32 28 44 40 ...
## $ si_55_d : num [1:184] 62 60 66 53 69 NA 37 23 50 46 ...
## $ si_60_d : num [1:184] 67 66 54 59 74 NA 37 23 56 51 ...
## $ si_65_d : num [1:184] 60 71 56 61 76 NA 39 22 70 52 ...
## $ si_70_d : num [1:184] 62 41 59 58 77 NA 38 21 60 55 ...
## $ si_75_d : num [1:184] 61 41 57 56 71 NA 34 19 67 54 ...
## $ si_80_d : num [1:184] 61 41 57 56 71 NA 33 15 59 54 ...
## $ si_85_d : num [1:184] 64 44 60 68 76 NA 28 17 79 57 ...
## $ si_90_d : num [1:184] 69 45 46 72 79 NA 24 21 82 51 ...
## $ si_95_d : num [1:184] 64 42 43 66 73 NA 23 20 76 48 ...
## $ si_100_d : num [1:184] 68 46 48 NA 76 NA 30 21 78 54 ...
## $ si_105_d : num [1:184] 67 41 49 NA 76 NA 33 22 78 55 ...
## $ si_110_d : num [1:184] 63 42 48 NA 72 NA 53 25 74 54 ...
## $ si_115_d : num [1:184] 66 42 60 NA 77 NA 62 12 78 54 ...
## $ si_120_d : num [1:184] 60 37 56 NA 71 NA 53 11 72 50 ...
## $ si_125_d : num [1:184] 52 36 57 NA 72 NA 46 11 73 49 ...
## $ si_130_d : num [1:184] 53 36 56 NA 72 NA 47 10 68 50 ...
## $ si_135_d : num [1:184] 27 31 66 NA 75 NA 50 12 72 53 ...
## $ si_140_d : num [1:184] 30 35 68 NA 79 NA 67 11 76 44 ...
## $ si_145_d : num [1:184] 5 36 73 NA 77 NA 64 12 73 45 ...
## $ si_150_d : num [1:184] 3 35 53 NA 77 NA 64 11 73 44 ...
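The 0/1/NA coding of the `strain…` columns seen in the structure above can also be produced without an explicit loop. The following is a minimal sketch, assuming a hypothetical helper `recode_strain()` and made-up strain names; unlike the `gsub()` approach it matches the whole cell value exactly, which is an assumption about how "None" and "Unknown" appear in the data.

```r
# Hypothetical strain values for one period (not taken from the dataset).
x <- c("Danish", "None", "Unknown", "Japan", "None")

recode_strain <- function(x) {
  out <- rep(1L, length(x))        # default: some strain was given
  out[x %in% "None"] <- 0L         # no vaccine was given
  out[x %in% "Unknown"] <- NA      # value is unknown
  out[is.na(x)] <- NA              # missing input stays missing
  out
}

recoded <- recode_strain(x)
print(recoded)
```

Applied to a data frame, the helper could be used column by column, for example with `lapply()` over every column except `country_name`.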
a) What is the mean of the variable “deaths per million 150 days after fifth death” (represented by `dpm_150_d`) over the countries that have measured it? And of the variable “stringency index 150 days after fifth death”?
m1 <- mean(COVID_BGC$dpm_150_d, na.rm = TRUE)
m2 <- mean(COVID_BGC$si_150_d, na.rm = TRUE)
cat(sprintf("The mean of dpm_150_d is %.2f\n", m1))
## The mean of dpm_150_d is 69.98
cat(sprintf("The mean of si_150_d is %.2f\n", m2))
## The mean of si_150_d is 39.89
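Because both means are computed only over the countries with a recorded value, it can help to report how many countries contribute to each one. A minimal sketch on a toy data frame (`toy` and the helper `summarise_column()` are hypothetical, and the numbers are made up):

```r
# Toy data with missing values (made-up numbers).
toy <- data.frame(
  country_name = c("A", "B", "C", "D"),
  dpm_150_d = c(10, 30, NA, 50),
  si_150_d = c(40, NA, NA, 60),
  stringsAsFactors = FALSE
)

# Mean over the measured values, together with the number of measured values.
summarise_column <- function(x) {
  c(mean = mean(x, na.rm = TRUE), n_measured = sum(!is.na(x)))
}

result <- sapply(toy[c("dpm_150_d", "si_150_d")], summarise_column)
print(result)
```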
b) Which countries have a value of “deaths per million 150 days after fifth death” (where present) below the mean? And which for the variable “stringency index 150 days after fifth death”?
cat(
  "The countries whose dpm_150_d value is below the dataset mean are:\n\n",
  paste(subset(COVID_BGC, dpm_150_d < m1)$country_name, collapse = ', '),
  "\n\n"
)
## The countries whose dpm_150_d value is below the dataset mean are:
##
## Argentina, Australia, Bahrain, Bangladesh, Barbados, Bosnia and Herzegovina, Burkina Faso, Cameroon, Canada, Costa Rica, Denmark, Dominican Republic, El Salvador, Greece, Guam, Guatemala, Haiti, Honduras, Indonesia, Iraq, Jamaica, Jordan, Kazakhstan, Kenya, Kuwait, Latvia, Liberia, Moldova, Montenegro, Morocco, Myanmar, Niger, Oman, Pakistan, Philippines, Portugal, Romania, Saudi Arabia, Senegal, Serbia, South Africa, Sri Lanka, Sudan, Switzerland, Taiwan, Tanzania, Thailand, Uruguay, Uzbekistan, Venezuela
cat(
  "The countries whose si_150_d value is below the dataset mean are:\n\n",
  paste(subset(COVID_BGC, si_150_d < m2)$country_name, collapse = ', ')
)
## The countries whose si_150_d value is below the dataset mean are:
##
## Afghanistan, Albania, Austria, Barbados, Bosnia and Herzegovina, Bulgaria, Burkina Faso, Cameroon, Cote d'Ivoire, Croatia, Czech Republic, Denmark, Estonia, Finland, France, Ghana, Greece, Hungary, Israel, Japan, Jordan, Latvia, Lithuania, Malaysia, Mali, Mauritius, Moldova, Netherlands, New Zealand, Niger, Norway, Pakistan, Poland, Romania, Singapore, Slovenia, Sri Lanka, Sweden, Switzerland, Taiwan, Tanzania, Thailand, Togo, Tunisia, Turkey, Ukraine, United Arab Emirates, Uruguay
c) Which countries meet both conditions of the previous question?
cat(
  "The countries whose dpm_150_d and si_150_d values are both below the dataset mean are:\n\n",
  paste(subset(COVID_BGC, (dpm_150_d < m1) &
    (si_150_d < m2))$country_name, collapse = ', ')
)
## The countries whose dpm_150_d and si_150_d values are both below the dataset mean are:
##
## Barbados, Bosnia and Herzegovina, Burkina Faso, Cameroon, Denmark, Greece, Jordan, Latvia, Moldova, Niger, Pakistan, Romania, Sri Lanka, Switzerland, Taiwan, Tanzania, Thailand, Uruguay
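The answer to c) should be exactly the intersection of the two lists from b). A minimal sketch of that cross-check on toy data (the data frame `toy` and its values are made up for illustration):

```r
# Toy data with missing values (made-up numbers).
toy <- data.frame(
  country_name = c("A", "B", "C", "D", "E"),
  dpm_150_d = c(10, 30, NA, 50, 20),
  si_150_d = c(40, NA, 20, 60, 30),
  stringsAsFactors = FALSE
)
m1 <- mean(toy$dpm_150_d, na.rm = TRUE)
m2 <- mean(toy$si_150_d, na.rm = TRUE)

# The two lists of question b); which() drops rows where the comparison is NA.
low_dpm <- toy$country_name[which(toy$dpm_150_d < m1)]
low_si <- toy$country_name[which(toy$si_150_d < m2)]

# Question c) computed in two ways: as a set intersection and as a combined filter.
both_sets <- intersect(low_dpm, low_si)
both_filter <- toy$country_name[which(toy$dpm_150_d < m1 & toy$si_150_d < m2)]

print(both_sets)
print(both_filter)
```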
d) Which countries had BCG vaccination in the most recent period (2015-2020) and a “deaths per million 150 days after fifth death” value below the mean?
cat(
  "The countries whose dpm_150_d value is below the dataset mean and that also had a recent BCG vaccination campaign are:\n\n",
  paste(subset(
    COVID_BGC, (dpm_150_d < m1) &
      (`strain2015-2020` == 1)
  )$country_name, collapse = ', ')
)
## The countries whose dpm_150_d value is below the dataset mean and that also had a recent BCG vaccination campaign are:
##
## Australia, Bangladesh, Bosnia and Herzegovina, El Salvador, Indonesia, Kazakhstan, Kenya, Kuwait, Latvia, Moldova, Pakistan, Philippines, Portugal, Romania, Saudi Arabia, Senegal, South Africa, Sudan, Taiwan, Tanzania, Thailand
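A list of countries does not by itself show whether the two groups differ. As a purely descriptive complement, the mean of `dpm_150_d` can be compared between countries with and without recent vaccination. The sketch below uses a toy data frame with made-up numbers and a hypothetical column `strain_recent` standing in for `strain2015-2020`; it says nothing about causation.

```r
# Toy data (made-up numbers); strain_recent: 1 = vaccinated, 0 = not, NA = unknown.
toy <- data.frame(
  country_name = c("A", "B", "C", "D", "E"),
  strain_recent = c(1, 1, 0, 0, NA),
  dpm_150_d = c(10, 20, 40, 60, 30),
  stringsAsFactors = FALSE
)

# Mean deaths per million within each vaccination group.
# Rows with unknown vaccination status are left out by tapply().
group_means <- tapply(toy$dpm_150_d, toy$strain_recent, mean, na.rm = TRUE)
print(group_means)
```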
a) Statistical summary of the variables
# Obviously a statistical summary makes no sense for some variables, such as alpha_3_code or the strains... but I will leave it for now
summary(COVID_BGC)
## country_name strain2015-2020 strain2010-2015 strain2005-2010
## Length:184 Min. :0.0000 Min. :0.0000 Min. :0.0000
## Class :character 1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.:1.0000
## Mode :character Median :1.0000 Median :1.0000 Median :1.0000
## Mean :0.6907 Mean :0.7021 Mean :0.7527
## 3rd Qu.:1.0000 3rd Qu.:1.0000 3rd Qu.:1.0000
## Max. :1.0000 Max. :1.0000 Max. :1.0000
## NA's :87 NA's :90 NA's :91
## strain2000-2005 strain1990-2000 strain1980-1990 strain1970-1980
## Min. :0.0000 Min. :0.0000 Min. :0.0000 Min. :0.0000
## 1st Qu.:1.0000 1st Qu.:1.0000 1st Qu.:0.0000 1st Qu.:0.0000
## Median :1.0000 Median :1.0000 Median :1.0000 Median :1.0000
## Mean :0.7727 Mean :0.7647 Mean :0.7262 Mean :0.5341
## 3rd Qu.:1.0000 3rd Qu.:1.0000 3rd Qu.:1.0000 3rd Qu.:1.0000
## Max. :1.0000 Max. :1.0000 Max. :1.0000 Max. :1.0000
## NA's :96 NA's :99 NA's :100 NA's :96
## strain1960-1970 strain1950-1960 alpha_3_code date_first_death
## Min. :0.0000 Min. :0.0000 AFG : 1 Min. :2020-01-11
## 1st Qu.:0.0000 1st Qu.:0.0000 AGO : 1 1st Qu.:2020-03-16
## Median :0.0000 Median :0.0000 ALB : 1 Median :2020-03-24
## Mean :0.4456 Mean :0.2917 ARE : 1 Mean :2020-03-30
## 3rd Qu.:1.0000 3rd Qu.:1.0000 ARG : 1 3rd Qu.:2020-04-03
## Max. :1.0000 Max. :1.0000 (Other):148 Max. :2020-08-01
## NA's :92 NA's :88 NA's : 31 NA's :40
## date_fifth_death dpm_10_d dpm_15_d dpm_20_d
## Min. :2020-01-21 Min. : 2.00 Min. : 1.00 Min. : 1.00
## 1st Qu.:2020-03-25 1st Qu.: 41.25 1st Qu.: 40.25 1st Qu.: 35.25
## Median :2020-04-03 Median : 82.50 Median : 82.50 Median : 83.50
## Mean :2020-04-16 Mean : 85.30 Mean : 86.23 Mean : 82.21
## 3rd Qu.:2020-04-22 3rd Qu.:130.75 3rd Qu.:131.50 3rd Qu.:125.75
## Max. :2020-08-29 Max. :174.00 Max. :175.00 Max. :169.00
## NA's :42 NA's :42 NA's :42 NA's :42
## dpm_25_d dpm_30_d dpm_35_d dpm_40_d
## Min. : 1.00 Min. : 1.00 Min. : 1.00 Min. : 1.00
## 1st Qu.: 38.00 1st Qu.: 37.75 1st Qu.: 38.50 1st Qu.: 35.50
## Median : 83.00 Median : 82.50 Median : 80.00 Median : 80.00
## Mean : 85.16 Mean : 84.56 Mean : 82.42 Mean : 82.17
## 3rd Qu.:131.00 3rd Qu.:132.25 3rd Qu.:125.00 3rd Qu.:126.50
## Max. :171.00 Max. :172.00 Max. :170.00 Max. :169.00
## NA's :43 NA's :44 NA's :45 NA's :45
## dpm_45_d dpm_50_d dpm_55_d dpm_60_d
## Min. : 1.00 Min. : 1.00 Min. : 1.00 Min. : 1.00
## 1st Qu.: 38.50 1st Qu.: 37.25 1st Qu.: 40.00 1st Qu.: 41.00
## Median : 83.00 Median : 85.50 Median : 88.00 Median : 86.00
## Mean : 84.71 Mean : 83.64 Mean : 85.61 Mean : 85.61
## 3rd Qu.:130.00 3rd Qu.:127.75 3rd Qu.:130.00 3rd Qu.:129.00
## Max. :171.00 Max. :169.00 Max. :170.00 Max. :170.00
## NA's :45 NA's :46 NA's :47 NA's :47
## dpm_65_d dpm_70_d dpm_75_d dpm_80_d
## Min. : 1.00 Min. : 1.00 Min. : 1.00 Min. : 1.00
## 1st Qu.: 38.75 1st Qu.: 40.75 1st Qu.: 39.25 1st Qu.: 42.25
## Median : 83.50 Median : 82.50 Median : 82.50 Median : 84.50
## Mean : 83.43 Mean : 83.44 Mean : 81.66 Mean : 83.92
## 3rd Qu.:127.25 3rd Qu.:127.75 3rd Qu.:123.75 3rd Qu.:127.75
## Max. :168.00 Max. :166.00 Max. :165.00 Max. :166.00
## NA's :48 NA's :50 NA's :50 NA's :50
## dpm_85_d dpm_90_d dpm_95_d dpm_100_d
## Min. : 1.00 Min. : 1.00 Min. : 1.00 Min. : 1.00
## 1st Qu.: 44.50 1st Qu.: 44.25 1st Qu.: 41.50 1st Qu.: 42.25
## Median : 86.50 Median : 85.00 Median : 82.00 Median : 80.00
## Mean : 84.74 Mean : 84.60 Mean : 81.91 Mean : 80.65
## 3rd Qu.:127.75 3rd Qu.:126.75 3rd Qu.:124.00 3rd Qu.:121.75
## Max. :166.00 Max. :166.00 Max. :163.00 Max. :160.00
## NA's :50 NA's :50 NA's :53 NA's :54
## dpm_105_d dpm_110_d dpm_115_d dpm_120_d
## Min. : 1.0 Min. : 1.00 Min. : 1.00 Min. : 1.00
## 1st Qu.: 42.5 1st Qu.: 42.50 1st Qu.: 40.00 1st Qu.: 41.00
## Median : 80.0 Median : 82.00 Median : 81.00 Median : 79.00
## Mean : 79.8 Mean : 80.22 Mean : 79.34 Mean : 78.19
## 3rd Qu.:119.5 3rd Qu.:119.75 3rd Qu.:120.00 3rd Qu.:117.00
## Max. :157.0 Max. :157.00 Max. :154.00 Max. :151.00
## NA's :57 NA's :58 NA's :59 NA's :63
## dpm_125_d dpm_130_d dpm_135_d dpm_140_d
## Min. : 1.00 Min. : 1.00 Min. : 1.00 Min. : 1.00
## 1st Qu.: 40.50 1st Qu.: 37.75 1st Qu.: 38.00 1st Qu.: 37.75
## Median : 78.50 Median : 76.50 Median : 78.00 Median : 74.50
## Mean : 77.33 Mean : 75.31 Mean : 76.03 Mean : 71.37
## 3rd Qu.:115.25 3rd Qu.:112.50 3rd Qu.:114.00 3rd Qu.:106.25
## Max. :149.00 Max. :146.00 Max. :144.00 Max. :138.00
## NA's :64 NA's :66 NA's :67 NA's :72
## dpm_145_d dpm_150_d si_10_d si_15_d
## Min. : 1.00 Min. : 1.00 Min. : 1.00 Min. : 1.00
## 1st Qu.: 36.75 1st Qu.: 35.75 1st Qu.:29.25 1st Qu.:28.25
## Median : 74.00 Median : 73.50 Median :43.50 Median :42.00
## Mean : 70.38 Mean : 69.98 Mean :40.45 Mean :38.16
## 3rd Qu.:103.25 3rd Qu.:104.25 3rd Qu.:56.00 3rd Qu.:52.00
## Max. :134.00 Max. :134.00 Max. :66.00 Max. :62.00
## NA's :76 NA's :76 NA's :50 NA's :50
## si_20_d si_25_d si_30_d si_35_d
## Min. : 1.00 Min. : 1.00 Min. : 1.00 Min. : 1.00
## 1st Qu.:25.00 1st Qu.:28.00 1st Qu.:26.50 1st Qu.:25.00
## Median :38.00 Median :41.00 Median :41.00 Median :41.00
## Mean :35.02 Mean :37.46 Mean :36.41 Mean :36.76
## 3rd Qu.:48.00 3rd Qu.:51.00 3rd Qu.:49.50 3rd Qu.:50.00
## Max. :58.00 Max. :60.00 Max. :60.00 Max. :61.00
## NA's :52 NA's :52 NA's :53 NA's :53
## si_40_d si_45_d si_50_d si_55_d
## Min. : 1.00 Min. : 1.00 Min. : 1.00 Min. : 1.00
## 1st Qu.:21.00 1st Qu.:25.00 1st Qu.:26.00 1st Qu.:25.00
## Median :39.00 Median :42.00 Median :43.00 Median :46.00
## Mean :35.57 Mean :39.78 Mean :39.39 Mean :43.18
## 3rd Qu.:52.00 3rd Qu.:56.00 3rd Qu.:55.00 3rd Qu.:60.00
## Max. :62.00 Max. :69.00 Max. :68.00 Max. :73.00
## NA's :55 NA's :55 NA's :55 NA's :56
## si_60_d si_65_d si_70_d si_75_d
## Min. : 1.00 Min. : 1.00 Min. : 1.00 Min. : 1.00
## 1st Qu.:25.00 1st Qu.:21.25 1st Qu.:25.00 1st Qu.:22.25
## Median :50.00 Median :46.50 Median :43.50 Median :40.50
## Mean :45.02 Mean :44.10 Mean :43.65 Mean :40.44
## 3rd Qu.:64.50 3rd Qu.:65.75 3rd Qu.:64.75 3rd Qu.:59.75
## Max. :78.00 Max. :78.00 Max. :79.00 Max. :75.00
## NA's :57 NA's :58 NA's :58 NA's :58
## si_80_d si_85_d si_90_d si_95_d si_100_d
## Min. : 1.00 Min. : 1.00 Min. : 1.0 Min. : 1.00 Min. : 1.00
## 1st Qu.:21.25 1st Qu.:21.25 1st Qu.:23.0 1st Qu.:21.00 1st Qu.:21.00
## Median :36.50 Median :39.50 Median :43.0 Median :37.50 Median :40.00
## Mean :38.57 Mean :40.72 Mean :42.3 Mean :38.26 Mean :40.39
## 3rd Qu.:57.75 3rd Qu.:60.00 3rd Qu.:63.0 3rd Qu.:56.75 3rd Qu.:60.00
## Max. :75.00 Max. :79.00 Max. :82.0 Max. :76.00 Max. :78.00
## NA's :58 NA's :58 NA's :59 NA's :62 NA's :65
## si_105_d si_110_d si_115_d si_120_d
## Min. : 1.00 Min. : 1.00 Min. : 1.00 Min. : 1.00
## 1st Qu.:22.50 1st Qu.:24.25 1st Qu.:20.50 1st Qu.:19.00
## Median :41.00 Median :42.00 Median :41.00 Median :37.00
## Mean :40.68 Mean :39.60 Mean :40.37 Mean :37.04
## 3rd Qu.:61.00 3rd Qu.:57.00 3rd Qu.:60.00 3rd Qu.:56.00
## Max. :78.00 Max. :74.00 Max. :78.00 Max. :74.00
## NA's :65 NA's :66 NA's :69 NA's :71
## si_125_d si_130_d si_135_d si_140_d si_145_d
## Min. : 1.00 Min. : 1.00 Min. : 1.00 Min. : 1.0 Min. : 1.00
## 1st Qu.:18.00 1st Qu.:18.00 1st Qu.:19.75 1st Qu.:19.5 1st Qu.:20.50
## Median :36.00 Median :36.50 Median :39.00 Median :42.0 Median :41.00
## Mean :36.82 Mean :37.00 Mean :39.56 Mean :41.0 Mean :40.21
## 3rd Qu.:56.00 3rd Qu.:55.25 3rd Qu.:59.25 3rd Qu.:62.0 3rd Qu.:60.00
## Max. :75.00 Max. :74.00 Max. :78.00 Max. :81.0 Max. :78.00
## NA's :71 NA's :72 NA's :76 NA's :77 NA's :81
## si_150_d
## Min. : 1.00
## 1st Qu.:19.25
## Median :40.00
## Mean :39.89
## 3rd Qu.:60.75
## Max. :78.00
## NA's :86
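Since a five-number summary is not very informative for identifiers or for 0/1 indicators, the summary could be restricted to the measurement columns and the strain indicators tabulated separately. A minimal sketch, assuming the `dpm_`, `si_` and `strain` name prefixes used in this report and a toy data frame with made-up values:

```r
# Toy data frame mimicking the column naming of the joined table (made-up values).
toy <- data.frame(
  country_name = c("A", "B", "C", "D"),
  alpha_3_code = c("AAA", "BBB", "CCC", "DDD"),
  "strain2015-2020" = c(1, 1, 0, NA),
  dpm_150_d = c(10, 30, NA, 50),
  si_150_d = c(40, NA, 20, 60),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# Numeric measurements: names starting with "dpm_" or "si_".
measure_cols <- grep("^(dpm|si)_", names(toy), value = TRUE)
print(summary(toy[measure_cols]))

# 0/1 indicators: a frequency table (including NAs) is more readable than quartiles.
strain_cols <- grep("^strain", names(toy), value = TRUE)
strain_counts <- lapply(toy[strain_cols], table, useNA = "ifany")
print(strain_counts)
```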
# Maybe we should plot histograms of death rates and of vaccines given. Honestly, I am not sure.
cormat <-
cor(COVID_BGC %>% select(
-c(
"country_name",
"alpha_3_code",
"date_first_death",
"date_fifth_death",
43:71
)
) %>% na.omit())
cormat2 <- cormat
cormat2[upper.tri(cormat2)] <-
  NA # So that each correlation is shown only once
cormat2 <- melt(round(cormat2, 2)) # Long format, so that ggplot can be used
ggplot(cormat2, aes(x = Var1, y = Var2, fill = value)) + geom_tile() + scale_fill_continuous(type = "viridis")
fig <-
plot_ly(
x = colnames(cormat),
y = colnames(cormat),
z = cormat,
type = "heatmap"
)
fig
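Calling `na.omit()` before `cor()` keeps only the rows that have no missing value in any of the selected columns, which can leave few countries when NAs are frequent. An alternative is to let `cor()` use, for each pair of columns, every row where both values are present. The sketch below compares both options on a toy data frame with made-up values; which one is preferable for this analysis is a judgment call.

```r
# Toy data with scattered missing values (made-up numbers).
toy <- data.frame(
  strain = c(1, 1, 0, 0, 1, NA),
  dpm_10_d = c(5, 8, 20, 25, NA, 30),
  dpm_150_d = c(40, NA, 90, 110, 50, 120)
)

# Listwise deletion: only rows complete in every column are used.
complete_rows <- na.omit(toy)
n_complete <- nrow(complete_rows)
cor_listwise <- cor(complete_rows)

# Pairwise deletion: each correlation uses all rows where both columns are present.
cor_pairwise <- cor(toy, use = "pairwise.complete.obs")

print(n_complete)
print(round(cor_listwise, 2))
print(round(cor_pairwise, 2))
```

Note that a matrix built from pairwise-complete observations mixes different subsets of countries across its cells, so its entries are not strictly comparable with one another.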
ggplot(COVID_BGC,
aes(x = dpm_50_d, y = `strain2005-2010`, label = country_name)) +
geom_jitter(position = position_jitter(seed = 1)) +
geom_label_repel(size = 2, position = position_jitter(seed = 1)) +
xlim(c(-100, 800))
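The histograms of death rates mentioned in the code comments could start from something like the following. This is a minimal sketch in base R on a made-up vector `dpm`, with bin edges chosen arbitrarily for the toy values; `hist()` ignores missing values on its own.

```r
# Made-up deaths-per-million values, including one missing value.
dpm <- c(5, 12, 18, 25, 31, 47, NA, 60)

# Compute the histogram without drawing it, so the bin counts can be inspected.
h <- hist(dpm, breaks = seq(0, 60, by = 20), plot = FALSE)
print(h$counts)

# Draw it.
plot(h, main = "Deaths per million (toy data)", xlab = "dpm")
```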